Abstract. In 1951 Robbins and Monro published the seminal article 
on stochastic approximation and made a specific reference to its ap- 
plication to the "estimation of a quantal using response, nonresponse 
data." Since the 1990s, statistical methodology for dose-finding studies 
has grown into an active area of research. The dose-finding problem 
is at its core a percentile estimation problem and is in line with what 
the Robbins-Monro method sets out to solve. In this light, it is quite 
surprising that the dose-finding literature has developed rather inde- 
pendently of the older stochastic approximation literature. The fact 
that stochastic approximation has seldom been used in actual clinical 
studies stands in stark contrast with its constant application in engi- 
neering and finance. In this article, I explore similarities and differences 
between the dose-finding and the stochastic approximation literatures. 
This review also sheds light on the present and future relevance of 
stochastic approximation to dose-finding clinical trials. Such connec- 
tions will in turn steer dose-finding methodology on a rigorous course 
and extend its ability to handle increasingly complex clinical situations. 

Key words and phrases: Coherence, dichotomized data, discrete bar- 
rier, ethics, indifference interval, maximum likelihood recursion, unbi- 
asedness, virtual observations. 



1. INTRODUCTION 

Dose-finding in phase I clinical trials is typically 
formulated as estimating a prespecified percentile 
of a dose-toxicity curve. That is, the objective is to
identify a dose $\theta$ such that $\pi(\theta) = p$, or equivalently,

(1) $\theta = \pi^{-1}(p)$,

where $\pi(x)$ is the probability of toxicity at dose $x$
and is assumed continuous and increasing in $x$. Percentile
estimation, often seen in bioassay, is a well-studied
problem for which statisticians have an extensive
set of tools; see the books by Finney (1978)
and Morgan (1992). There are, however, two practi- 
cal aspects of clinical studies that distinguish phase 
I dose-finding from the classical bioassay problem. 
First, the experimental units are humans. An impli- 
cation is that the subjects should be treated sequen- 
tially with respect to some ethical constraints (e.g., 
Section 2.3.1). As such, dose-finding is as much a de- 
sign problem as an analysis problem. Second, the ac- 
tual doses administered to the subjects are confined 
to a discrete panel of levels, denoted by $\{d_1, \ldots, d_K\}$,
with $\pi(d_1) < \cdots < \pi(d_K)$. Therefore, it is possible
that $\pi(d_k) \neq p$ for all $k$, and the working objective
then is to identify the dose

(2) $\nu = \arg\min_{d_k} |\pi(d_k) - p|$.

Apparently, the continuous dose-finding objective $\theta$
and the discrete objective $\nu$ are close to each other.
However, in this article, we will see that discretized
versions of methods developed for $\theta$ are not necessarily
good solutions for $\nu$. This special section of
Statistical Science also consists of four other articles 
that review some benchmarks in the recent develop- 
ment of the so-called model-based methods for dose- 
finding studies. In a nutshell, a model-based method 
makes dose decisions based on the explicit use of a 
dose-toxicity model. That is, the toxicity probability
at dose $x$, $\pi(x)$, is postulated to be $F(x, \phi_0)$ for
some true parameter value $\phi_0$. This is in contrast
to the class of algorithm-based designs whereby a 
set of dose-escalation rules are prespecified for any 
given dose without regard to the observations at the 
other doses. Section 2 of this article will present a 
brief history of the development of the modern dose- 
finding methods and define the scope of this special 
issue. 

In addition, this article complements the other 
articles in two ways. First, it consolidates the key 
theoretical dose-finding criteria that are otherwise 
scattered in the literature (Section 2.3). Second, it 
compares and contrasts the dose-finding literature 
with the large literature on stochastic approximation
(Section 3); the former primarily addresses the
discrete objective $\nu$, whereas the latter deals with
$\theta$. While this literature synthesis is of intellectual
interest, it also sheds light on how we may tailor 
the well-studied stochastic approximation method 
to meet the practical needs in dose-finding studies 
(Section 4). Section 5 will end this article with some 
future directions in dose-finding methodology. 

2. MODERN DOSE-FINDING METHODS 

2.1 A Brief History 

This article uses the work of Storer and DeMets 
(1987) as a historical line to define the modern statist- 
ical literature of dose-finding. Little discussion and 
formal formulation of the dose-finding problem ex- 
isted in the pre-1987 statistical literature; an excep- 
tion was the article by Anbar (1984). While dose- 
finding in cancer trials was discussed as early as in 
the 1960s in the biomedical communities, a well- 
defined quantitative objective such as (2) was ab- 
sent in the communications; see the work of Schnei- 
derman (1965) and Geller (1984), for example. The 
article by Storer and DeMets (1987) is the earliest 
reference, to the best of my knowledge, that engages 



the clinical readership with the idea of percentile 
estimation. The authors point out the arbitrary es- 
timation properties associated with the traditional 
3+3 algorithm used in actual dose-finding studies 
in cancer patients. The 3+3 algorithm identifies the 
so-called maximum tolerated dose (MTD) using the 
following dose-escalation rules after enrolling every 
group of three subjects: let $x_j$ denote the dose given
to the $j$th group of subjects and suppose $x_j = d_k$;
then

(3) $x_{j+1} = \begin{cases} d_{k+1}, & \text{if } z_k/n_k < 0.33, \\ d_k, & \text{if } z_k = 1 \text{ and } n_k = 3, \\ d_{k-1}, & \text{if } z_k \geq 2, \end{cases}$

where $n_k$ and $z_k$ respectively denote the cumulative
sample size and number of toxicities at dose $d_k$. The
trial will be terminated once a de-escalation occurs, 
and the next lower dose will be called the MTD. In 
the sequel, Storer (1989) deduced from the 3+3 al- 
gorithm (3) that a cancer dose-finding study aims to 
estimate the 33rd percentile (i.e., p = 0.33). While it 
has now emerged that the target is likely lower than 
the 33rd percentile with p being between 0.16 and 
0.25, their work has shaped the subsequent develop- 
ment of dose-finding methods in both the statistical 
and biomedical literatures, and the MTD has since
been defined invariably as a dose associated with a
prespecified toxicity probability p.
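
As a rough illustration of rule (3), the following Python sketch applies the 3+3 decisions to a sequence of group outcomes. The function names, the 1-based level indexing, the cap at the highest level, and the convention that a returned 0 means "no dose below level 1" are assumptions made here for illustration; they are not specified in the text.

```python
def next_level_3plus3(k, n_k, z_k, K):
    """Next dose level under rule (3), given cumulative counts at level k.

    k is 1-based; K is the number of levels (capping at K is an assumption).
    """
    if z_k >= 2:
        return k - 1                 # de-escalate
    if z_k == 1 and n_k == 3:
        return k                     # repeat the current dose
    # Remaining cases (z_k = 0, or z_k = 1 with n_k >= 6) have z_k/n_k < 0.33.
    return min(k + 1, K)             # escalate


def run_3plus3(group_toxicities, K):
    """Apply rule (3) to toxicity counts of consecutive groups of three.

    Returns the MTD level once a de-escalation occurs (0 if the de-escalation
    is from level 1), or None if the listed groups never trigger one.
    """
    n = [0] * (K + 1)                # cumulative sample size per level
    z = [0] * (K + 1)                # cumulative toxicities per level
    k = 1                            # start at the lowest level
    for tox in group_toxicities:
        n[k] += 3
        z[k] += tox
        nxt = next_level_3plus3(k, n[k], z[k], K)
        if nxt < k:
            return nxt               # trial stops; next lower dose is the MTD
        k = nxt
    return None
```

For example, `run_3plus3([0, 1, 0, 2], K=5)` walks through levels 1, 2, 2, 3 and stops after two toxicities at level 3, calling level 2 the MTD.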

O'Quigley, Pepe and Fisher (1990) proposed the 
continual reassessment method (CRM) in 1990. The 
CRM is the first model-based method in the modern 
dose-finding literature. The main idea of the method 
is to treat the next subject or group of subjects at 
the dose with toxicity probability estimated to be 
closest to the target $p$. Precisely, suppose we have
observations from the first $j$ groups of subjects and
compute the posterior mean $\hat{\phi}_j$ of $\phi$ given these
observations. Then the next group of subjects will be
treated at

(4) $x_{j+1} = \arg\min_{d_k} |F(d_k, \hat{\phi}_j) - p|$.
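
To make (4) concrete, here is a minimal numerical sketch in Python. The working model $F(d_k, \phi) = s_k^{\exp(\phi)}$ with a fixed "skeleton" $s_k$, the normal prior on $\phi$ with variance 1.34, and the grid-based integration are all assumptions chosen for illustration; the text does not fix any of them.

```python
import numpy as np


def crm_next_level(levels, tox, skeleton, p, prior_var=1.34):
    """Return (0-based index of the next dose level, posterior mean of phi).

    levels: 0-based dose indices given so far; tox: matching 0/1 outcomes.
    Assumed working model: F(d_k, phi) = skeleton[k] ** exp(phi),
    with phi ~ N(0, prior_var) a priori.
    """
    skeleton = np.asarray(skeleton, dtype=float)
    sd = np.sqrt(prior_var)
    phi = np.linspace(-6 * sd, 6 * sd, 4001)      # uniform integration grid
    log_post = -0.5 * (phi / sd) ** 2             # log prior, up to a constant
    for k, y in zip(levels, tox):
        log_F = np.exp(phi) * np.log(skeleton[k])  # log F(d_k, phi)
        log_post += y * log_F + (1 - y) * np.log1p(-np.exp(log_F))
    w = np.exp(log_post - log_post.max())         # unnormalised posterior weights
    phi_hat = float(np.sum(phi * w) / np.sum(w))  # posterior mean on the grid
    F_hat = skeleton ** np.exp(phi_hat)           # plug-in toxicity estimates
    return int(np.argmin(np.abs(F_hat - p))), phi_hat
```

With no data the posterior mean is the prior mean 0, so the rule simply picks the skeleton value closest to $p$; observed toxicities pull $\hat{\phi}_j$ downward and the recommended level with it.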

A similar idea is adopted in most model-based de- 
signs proposed since 1990. One example is the esca- 
lation with overdose control (EWOC) by Babb, Ro- 
gatko and Zacks (1998), who applied the continual 
reassessment notion but estimated the MTD with 
respect to an asymmetric loss function which places 
heavier penalties on overdosing than underdosing. 
O'Quigley and Conaway (2010) and Tighiouart and 
Rogatko (2010) in this special issue review the CRM 
and the EWOC and their respective extensions. Another
CRM-like design is the curve-free method by
Gasparini and Eisele (2000) who estimated the dose-toxicity
curve using a Bayesian nonparametric
method in an attempt to avoid bias due to model 
misspecification. Leung and Wang (2001) proposed 
an analogous frequentist version that uses isotonic 
regression for estimation. Other model-based designs 
include the Bayesian decision-theoretic design 
(Whitehead and Brunier (1995)), the logistic dose- 
ranging strategy (Murphy and Hall (1997)), and 
Bayesian c-optimal design (Haines, Perevozskaya, and 
Rosenberger (2003)). 

The late 1990s saw an increasing interest in algo- 
rithm-based designs. Durham, Flournoy and Rosen- 
berger (1997) proposed a biased coin design by which 
the dose for the next subject is reduced if the current 
subject has a toxic outcome, and the dose is escalated
with a probability $p/(1 - p)$ otherwise. The
biased coin design is a randomized version of the 
Dixon and Mood (1948) up-and-down design. Mo- 
tivated by its similarity to the traditional design, 
Cheung (2007) studied a class of stepwise proce- 
dures that includes (3) as a special case. Yet an- 
other algorithm-based method was proposed by Ji, 
Li, and Bekele (2007) who made interim decisions 
based on the posterior toxicity probability interval 
associated with each dose. The impetus for these 
algorithm-based designs is simplicity: the decision 
rules can be charted prior to the trial, so that the 
clinical investigators know exactly how doses will be 
assigned based on the observed outcomes. 
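
The biased coin rule described above can be charted in a few lines. The sketch below is illustrative only: the function name, the 1-based levels, and the handling of the lowest and highest levels (the dose simply stays put when it cannot move) are assumptions, and the escalation probability $p/(1 - p)$ requires $p \leq 0.5$.

```python
import random


def biased_coin_next(k, toxic, p, K, rng=random):
    """Next dose level (1..K) under the biased coin rule.

    Reduce the dose after a toxic outcome; otherwise escalate with
    probability p / (1 - p). Boundary handling is an assumption.
    """
    if not 0 < p <= 0.5:
        raise ValueError("p / (1 - p) is a probability only for 0 < p <= 0.5")
    if toxic:
        return max(k - 1, 1)
    if rng.random() < p / (1 - p):
        return min(k + 1, K)
    return k
```

Because a move up is possible only after a nontoxic outcome and a move down only after a toxic one, the rule respects the coherence principle discussed in Section 2.3.1 by construction.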

In order to make dose-finding techniques relevant 
to clinical practice, statisticians have responded to 
the realistically complicated clinical situations such 
as time-to-toxicity endpoints (Cheung and Chap- 
pell (2000)) and combination treatments (Thall et 
al. (2003)). While the core dose-finding objective re- 
mains a percentile estimation problem, the complex- 
ity of dose-finding methods has grown rapidly in the 
literature, with most innovations taking the model- 
based approach. Thall (2010) in this special issue 
will review the major development of these complex 
designs. 

Most (model-based) designs in the literature up 
to this point take the myopic approach by which 
the dose assignment is optimized with respect to the 
next immediate subject without regard to the future 
subjects. Bartroff and Lai (2010) in this issue break 
away from this direction and propose a model-based 
method from an adaptive control perspective. While 
this work attempts to solve a specific Bayesian op- 
timization problem, it also sets a new direction in 



the modern dose-finding techniques; see Section 3 of 
this article. 

2.2 Why Model-Based Now 

A model-based design allows borrowing strength 
from information across doses. This characteristic 
appeals to statisticians and clinicians alike, espe- 
cially because of the typically small-to-moderate sam- 
ple sizes seen in early-phase clinical studies. As clin- 
icians begin to appreciate the crucial role of dose- 
finding in the entire drug development program and 
the value of statistical inputs to reconcile the ethi- 
cal and research aspects in early-phase trials, their 
discussions have revolved around model-based inno- 
vations such as the CRM (Ratain et al. (1993)) and 
the EWOC (Eisenhauer et al. (2000)). The increas- 
ing number of applications in actual trials (Muller 
et al. (2004)) indicates the clinical awareness and 
readiness for these model-based methods. 

When compared to the simplicity of algorithm- 
based methods, the model-based approaches are com- 
putationally complex and require special program- 
ming before and during the implementation of a 
trial. Thanks to the advances of computing algorithms
(e.g., Markov chain Monte Carlo) and computer
technology, however, trial planning with extensive
simulation has become feasible. This being
said, the full-scale dynamic programming can still
stretch the computing resource; see the article by
Bartroff and Lai (2010) for some comparison of computational
times. In addition, statistician-friendly
software has become increasingly available for the
planning and execution of these model-based designs,
for example, the dfcrm package in R (Cheung
(2008)). These indicators of computational maturity
transform the model-based designs into practical
tools for dose-finding trials.

Finally, the development of dose-finding theory 
and dose-response models in the past two decades 
lends scientific rigor to the complexity of the model- 
based methods. Indeed, the goal of this special issue 
is to review the theoretical and modeling progress 
made in the modern dose-finding literature, and 
thereby demonstrate the full promise, and perhaps 
challenges, of the model-based methods. 

2.3 Some Theoretical Criteria 

In a typical dose-finding trial, subjects are enrolled
in small groups of size $m \geq 1$. The enrollment
plan is said to be fully sequential when $m = 1$. Let
$x_i$ denote the dose given to the $i$th group of subject(s).
Thus, the sequence $\{x_i\}$ forms the design of
a dose-finding study. As most dose-finding methods
are outcome-adaptive, each design point $x_i$ is random
and depends on the previous observation history.
Evaluation of a dose-finding method therefore
involves the study of its design space with respect 
to some ethical and estimation criteria. This section 
will review some key dose-finding criteria including 
coherence, rigidity, indifference intervals, and unbi- 
asedness. 

2.3.1 Coherence First, consider fully sequential tri- 
als with m = 1, so that each human subject is an 
experimental unit. An ethical principle, coined co- 
herence by Cheung (2005), dictates that no escala- 
tion should take place for the next enrolled subject 
if the current subject experiences some toxicity, and 
that dose reduction for the next subject is not appropriate
if the current subject has no sign of toxicity.
Precisely, let $Y_i$ denote the toxicity outcome
of the $i$th subject. An escalation for the $i$th subject is
said to be coherent only when $Y_{i-1} = 0$; likewise, a
de-escalation is coherent only when $Y_{i-1} = 1$. Extending
the notion of coherence for each move, one
can naturally define coherence as a property of a
dose-finding method:

Property 1 (Coherence). A dose-finding design
$\mathcal{D}$ is said to be coherent in escalation if with probability 1

(5) $P_{\mathcal{D}}(U_i > 0 \mid Y_{i-1} = 1) = 0$

for all $i$, where $U_i = x_i - x_{i-1}$ is the dose increment
from subject $i - 1$ to $i$, and $P_{\mathcal{D}}(\cdot)$ denotes probability
computed under the design $\mathcal{D}$. Analogously, the
design is said to be coherent in de-escalation if with
probability 1

(6) $P_{\mathcal{D}}(U_i < 0 \mid Y_{i-1} = 0) = 0$

for all $i$.

It is important to note that coherence is motivated 
by ethical concerns, and hence may not correspond 
to efficient estimation of the dose-toxicity curve. For 
example, in bioassay, an efficient design obtained by 
sequentially maximizing some function of the infor- 
mation may induce incoherent moves, and thus is 
not appropriate for human trials; see the work of 
McLeish and Tosh (1990), for example. 

An algorithm-based design can explicitly incorpo- 
rate dose decision rules that respect the coherence 



principles; see the biased coin design. For a model- 
based design, on the other hand, it is not imme- 
diately clear whether coherence necessarily holds. 
There are three general ways to ensure coherence in 
practice. First, one could adopt model-based meth- 
ods that have been proven coherent analytically. This 
includes the one-stage Bayesian CRM. Second, one 
could take a numerical approach. Let $N$ denote the
sample size of a trial. Then the design space is completely
generated by the first $N - 1$ binary toxicity
observations, and thus consists of $2^{N-1}$ possible
design outcomes. Therefore, one could establish coherence
(for a given $N$) by enumerating all possible
outcomes and verifying that there is no incoherent
move. In some cases, the number of outcomes can be
immensely reduced to the order of $N$; see Theorem
1 in the article by Cheung (2005). Third, one could
enforce coherence by restriction when the model- 
based dose assignment is incoherent. Applying co- 
herence restrictions is common in practice (Faries 
(1994)) and is the most straightforward approach for 
complex designs. On the other hand, the restricted 
moves need to be examined carefully lest they should 
cause an incompatibility problem as defined by Che- 
ung (2005). 
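
The numerical approach can be sketched as a brute-force check over all $2^{N-1}$ outcome sequences. In the Python sketch below, the interface `design(doses, outcomes)`, which returns the next dose level from the history, and the two toy designs are hypothetical and serve only to exercise the check; the design is assumed to be deterministic and fully sequential ($m = 1$).

```python
from itertools import product


def is_coherent(design, N):
    """Check coherence of a deterministic design by full enumeration.

    design(doses, outcomes) -> next dose level, given the doses assigned and
    the 0/1 toxicity outcomes observed so far (both empty for subject 1).
    """
    for ys in product((0, 1), repeat=N - 1):
        doses = [design([], [])]
        for i in range(N - 1):
            nxt = design(doses, list(ys[: i + 1]))
            u = nxt - doses[-1]                  # dose increment U
            if (ys[i] == 1 and u > 0) or (ys[i] == 0 and u < 0):
                return False                     # incoherent move found
            doses.append(nxt)
    return True


def up_and_down(doses, outcomes, K=5):
    """Toy design: step down after a toxicity, step up otherwise."""
    if not doses:
        return 1
    return max(doses[-1] - 1, 1) if outcomes[-1] == 1 else min(doses[-1] + 1, K)


def always_up(doses, outcomes, K=5):
    """Toy design that ignores outcomes, hence escalates after toxicities."""
    return 1 if not doses else min(doses[-1] + 1, K)
```

The cost grows as $2^{N-1}$, so this direct enumeration is practical only for small $N$ unless the outcome space can be reduced as noted above.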

In practice, the enrollment plan is often small- 
group sequential, that is, m > 1, in order to reduce 
the number of interim decisions and hence trial du- 
ration. In this case, each group of subjects may be 
viewed as an experimental unit. A generalized ver- 
sion of Property 1 can be stated as: 

Property 1' (Group coherence). A dose-finding
design $\mathcal{D}$ is said to be group coherent in escalation
if with probability 1

(7) $P_{\mathcal{D}}(U_i > 0 \mid \bar{Y}_{i-1} > p) = 0$

for all $i$, where $U_i = x_i - x_{i-1}$ now denotes the dose
increment from group $i - 1$ to $i$ and $\bar{Y}_{i-1}$ is the observed
proportion of toxicities in group $i - 1$. Analogously,
the design is said to be group coherent in
de-escalation if with probability 1

(8) $P_{\mathcal{D}}(U_i < 0 \mid \bar{Y}_{i-1} < p) = 0$

for all $i$.

It is easy to see that (7) and (8) reduce to (5) and
(6), respectively, when $m = 1$ for $p \in (0, 1)$.

2.3.2 Rigidity and sensitivity A design sequence
$\{x_n\}$ is strongly consistent for $\theta$ if $x_n \to \theta$ with
probability 1. For trials allowing only a discrete number
of test doses as in (2), consistency means $x_n = \nu$
eventually with probability 1. Consistency, a desirable
statistical property in general, has an ethical
connotation in dose-finding studies because it implies
all subjects enrolled after a certain time point
will be treated at $\nu$, which is the desired dose.

Property 2 (Rigidity). A dose-finding design
$\mathcal{D}$ is said to be rigid if for every $0 < p_L < \pi(\nu) < p_U < 1$
and all $n \geq 1$,

$P_{\mathcal{D}}\{x_n \in I_\pi(p_L, p_U)\} < 1$,

where $I_\pi(p_L, p_U) = \{x : p_L < \pi(x) < p_U\}$.

It is easy to see that consistency excludes the 
rigidity problem. In other words, Property 2 implies 
that a design is inconsistent. In particular, rigidity
occurs when a CRM-like procedure is applied in
conjunction with nonparametric estimation. Hence,
such a nonparametric design is inconsistent. This is 
quite interesting and somewhat counterintuitive, be- 
cause nonparametric estimation is introduced with 
an intention to remove bias and to enhance the 
prospect of consistency. 

To illustrate, consider a design that starts at dose
level 1, enrolls subjects in groups of size $m = 2$, and
assigns the next group at $\arg\min_k |\hat{p}_k - p|$ where
$\hat{p}_k$ is an estimate of $\pi(d_k)$ based on isotonic regression,
and the target is $p = 0.20$. Now suppose
that none of the subjects in the first group has a
toxic outcome. Then suppose the second group enters
the trial at dose level 2, with one of the two
experiencing toxicity. Based on these observations,
the isotonic estimates are $\hat{p}_1 = 0.00$ and $\hat{p}_2 = 0.50$,
which bring the trial back to dose level 1. From
this point on, because there is no parametric extrapolation
to affect the estimation of $\pi(d_2)$ by the
data collected at $d_1$, the isotonic estimate $\hat{p}_2$ will
be no smaller than 0.50 regardless of what happens
at $d_1$, that is, $|\hat{p}_2 - 0.20| \geq 0.30$. As a result,
$|\hat{p}_1 - 0.20| < 0.30 \leq |\hat{p}_2 - 0.20|$ if $\hat{p}_1 < 0.20$. That is,
the trial will stay at dose level 1 even if there is a
long string of nontoxic outcomes there!
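
The example above can be reproduced with a small pool-adjacent-violators routine. The Python sketch below is an illustration under stated assumptions: the function names are made up, only dose levels that already have data are passed in, and ties in the distance to the target are broken toward the lower level.

```python
def pava(tox, n):
    """Isotonic (nondecreasing) estimates of toxicity rates tox[k] / n[k].

    Weighted pool-adjacent-violators; every n[k] must be positive.
    """
    blocks = []                                   # [toxicities, subjects, levels]
    for t, m in zip(tox, n):
        blocks.append([t, m, 1])
        # Pool while the previous block's rate exceeds the last block's rate.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            t2, m2, c2 = blocks.pop()
            blocks[-1][0] += t2
            blocks[-1][1] += m2
            blocks[-1][2] += c2
    fit = []
    for t, m, c in blocks:
        fit.extend([t / m] * c)
    return fit


def next_level(tox, n, p):
    """1-based level whose isotonic estimate is closest to the target p."""
    fit = pava(tox, n)
    return min(range(len(fit)), key=lambda k: abs(fit[k] - p)) + 1
```

After the two groups in the example, `pava([0, 1], [2, 2])` gives estimates 0.0 and 0.5, and `next_level([0, 1], [42, 2], 0.20)` still returns level 1 after twenty further nontoxic groups at the lowest dose.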

This example demonstrates that nonparametric 
estimation and the sequential sampling plan together 
cause rigidity through an "extreme" overestimate of
$\pi(d_2)$ based on small sample size. The probability of
this extreme overestimation is nonnegligible indeed:
if dose level 2 is the true MTD with $\pi(d_2) = 0.20$,
then the probability that the trial is confined to the
suboptimal dose 1 is at least 0.36 by a simple binomial
calculation. Cheung (2002) constructed a similar
numerical example for the Bayesian nonparametric
curve-free method, and suggested that the
rigidity probability can be reduced by using an in- 
formative prior to add smoothness to the estimation. 

Due to ethical constraints such as coherence and 
the discrete design space, it may be challenging to 
achieve consistency without strong model assump- 
tions. For example, the CRM has been shown to 
be consistent under certain model misspecifications, 
but is not generally so (Shen and O'Quigley (1996)). 
In this context, Cheung and Chappell (2002) introduced
the indifference interval as a sensitivity measure
of how close a design may approach $\nu$ on the
probability scale:

Property 3 (Indifference interval). The indifference
interval of a dose-finding design $\mathcal{D}$ exists and
is equal to $p \pm \delta$ if there exist $N > 0$ and $\delta \in (0, p)$
such that

$P_{\mathcal{D}}\{x_n \in I_\pi(p - \delta, p + \delta) \text{ for all } n > N\} = 1$.

Apparently, the smaller the half-width $\delta$ of a design's
indifference interval is, the closer the design
converges to the MTD; whereas a large $\delta$ indicates
the design is sensitive to the underlying $\pi$. The sensitivity
of the design $\mathcal{D}$ can thus be measured by
$\delta$. Specifically, a design with half-width $\delta$ (for some
$\delta < p$) will be called a $\delta$-sensitive design.

It is clear that if a design $\mathcal{D}$ is consistent for $\nu$,
then it is $\delta$-sensitive; that is, one may choose $\delta$ so
that $\pi(\nu) \in p \pm \delta$. Also, if $\mathcal{D}$ is $\delta$-sensitive, then it
is nonrigid. Thus, while consistency appears to be
too difficult and nonrigidity too nondiscriminatory
for a dose-finding design, $\delta$-sensitivity seems to be
a reasonable design property. Cheung and Chappell
(2002) prescribed a way to calculate the indifference
interval of the CRM, that is, the CRM is $\delta$-sensitive.
Moreover, Lee and Cheung (2009) showed that the
CRM can be calibrated to achieve any $\delta$ level of
sensitivity. However, it should be noted that the indifference
interval is an asymptotic criterion. As such, a
small $\delta$ does not necessarily yield good finite-sample
properties.

2.3.3 Unbiasedness The performance of a reasonable
dose-finding design is expected to improve as
the underlying dose-toxicity curve $\pi$ becomes steep.
This property, called unbiasedness by Cheung (2007),
is formulated as follows:

Fig. 1. Two dose-toxicity curves under which dose 3 is the
MTD with $p = 0.20$. A $\delta$-sensitive design with $\delta = 0.06$ will
eventually select doses 2 or 3 under the shallow curve (curve
1), but will be consistent for dose 3 under the steep curve
(curve 2). The horizontal dotted lines indicate the indifference
interval.

Property 4 (Unbiasedness). Let $p_i = \pi(d_i)$ denote
the true toxicity probability at dose $d_i$. A design
$\mathcal{D}$ is said to be unbiased if

(a) $P_{\mathcal{D}}(x_n = d_k)$ is nonincreasing in $p_{i'}$ for $i' < k$,
and

(b) $P_{\mathcal{D}}(x_n = d_k)$ is nondecreasing in $p_i$ for $i > k$.

For the special case with $d_k = \nu$ and $\pi(\nu) = p$, unbiasedness
implies that the probability of correctly
selecting $\nu$ increases as the doses above the MTD
become more toxic (i.e., $p_i \gg p$), or the doses below
less toxic (i.e., $p_{i'} \ll p$). In other words, the design
will select the true MTD more often as it becomes
more separated from its neighboring doses in terms
of toxicity probability. A design that satisfies this
special case is called weakly unbiased.

One may argue that $\delta$-sensitive designs (e.g., the
CRM) are asymptotically weakly unbiased, in that
they will be consistent if the underlying dose-toxicity
curve $\pi$ becomes sufficiently steep around the MTD;
see Figure 1 for an illustration. Unbiasedness has
been established only for a few designs in the dose-finding
literature; an example is the class of stepwise
procedures (Cheung (2007)). In practice, extensive 
simulations are usually required, and are often ade- 
quate, to confirm that a design is (weakly) unbiased. 



3. STOCHASTIC APPROXIMATION 

3.1 The Robbins-Monro (1951) Procedure 

Robbins and Monro (1951) introduced the first 
formal stochastic approximation procedure for the 
problem of finding the root of a regression function. 
Precisely, let $M(x)$ be the mean of an outcome variable
$Y = Y(x)$ at level $x$, and suppose $M(x) = \alpha$
has a unique root $\theta$ and $\sup_x E\{Y^2(x)\} < \infty$. Then
the stochastic approximation recursion approaches
$\theta$ sequentially:

(9) $x_{i+1} = x_i - \frac{1}{ib}(Y_i - \alpha)$

for some constant $b > 0$. It is well established that
$x_n \to \theta$ with probability 1. If, in addition, the constant
$b$ is chosen properly, namely $b < 2M'(\theta) = 2\beta$,
then $n^{1/2}(x_n - \theta)$ will converge in distribution to a
normal variable with mean 0 and variance $\sigma^2\{b(2\beta - b)\}^{-1}$,
where $\sigma^2 = \lim_{x \to \theta} \mathrm{var}\{Y(x)\}$; see the works
of Sacks (1958) and Wasan (1969).
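
Recursion (9) is short enough to state directly in code. The Python sketch below is illustrative: the function names and the logistic curve used to generate binary outcomes are assumptions made here, not part of the procedure itself.

```python
import math
import random


def robbins_monro(draw_y, x1, alpha, b, n):
    """Run recursion (9) for n steps and return the final design point.

    draw_y(x) returns one observed outcome Y at level x.
    """
    x = x1
    for i in range(1, n + 1):
        x = x - (draw_y(x) - alpha) / (i * b)
    return x


def make_binary_sampler(rng, slope=1.0, mid=2.0):
    """Hypothetical binary outcome with a logistic mean curve M(x)."""
    def draw(x):
        prob = 1.0 / (1.0 + math.exp(-slope * (x - mid)))
        return 1 if rng.random() < prob else 0
    return draw
```

With the assumed curve and target $\alpha = 0.20$, the root is $\theta = 2 + \log(0.2/0.8) \approx 0.614$ and the slope there is $\beta = 0.2 \times 0.8 = 0.16$, so a call such as `robbins_monro(make_binary_sampler(random.Random(1)), 2.0, 0.20, 0.16, 20000)` should settle near 0.614.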

It is immediately clear that (9) is applicable to
address objective (1) in a clinical trial setting with
$M = \pi$ and $\alpha = p$. For one thing, the recursion output
is coherent (Property 1), thus passing the first
ethical litmus test. It is also easy to see that a small-group
sequential version of (9), that is, replace $Y_i$
with $\bar{Y}_i$, is group coherent (Property 1'). There are,
however, several practical considerations.

The choice of $b$ is crucial. In view of efficiency,
the asymptotic variance is minimized when we set
$b = \beta$, which is typically unknown in most applications.
This leads to the idea of adaptive stochastic
approximation where $b$ is replaced by a sequence $b_i$
that is strongly consistent for $\beta$ (Lai and Robbins
(1979)). However, when the sample size is small-to-moderate,
the numerical instability induced by the
adaptive choice $b_i$ may offset its asymptotic advantage.
In this article, for a reason described in Section
4.2, we assume that a good choice of $b$ is available.
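
The optimality of $b = \beta$ can be verified in one line from the asymptotic variance given with (9), here written as a function $v(b)$ (a label introduced only for this derivation), by completing the square:

```latex
\begin{align*}
v(b) &= \frac{\sigma^2}{b(2\beta - b)}, \qquad 0 < b < 2\beta, \\
b(2\beta - b) &= \beta^2 - (b - \beta)^2 \le \beta^2,
  \quad \text{with equality if and only if } b = \beta, \\
\min_{0 < b < 2\beta} v(b) &= v(\beta) = \frac{\sigma^2}{\beta^2}.
\end{align*}
```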

The next practical issue is that (9) entails the 
availability of a continuum of doses. This is seldom 
feasible in practice. In drug trials, dose availability 
is often limited by the dosage of a tablet. For treat- 
ments involving combination of drugs administered 
multiple times over a fixed period, each subsequent 
dose may involve increasing doses and/or frequency 
of different drugs. For example, Table 1 describes
the dose schedules of bortezomib used in a dose-finding
trial in patients with lymphoma (Leonard
et al. (2005)). The first three levels prescribe bortezomib
at a fixed dose 0.7 mg/m² with increasing frequency,
whereas the next two increments apply the
same frequency with increasing bortezomib doses.
While we are certain that the risk for toxicity increases
over each level, there is no natural scale of
dosage (e.g., mg/m²). Thus, assuming that the toxicity
probability $\pi(x)$ is well defined on a continuous
range of $x$ is artificial.

Table 1
Dose schedules of bortezomib used in Leonard et al. (2005)

Level   Dose and schedule within cycle
1       0.7 mg/m² on day 1 of each cycle
2       0.7 mg/m² on days 1 and 8 of each cycle
3       0.7 mg/m² on days 1 and 4 of each cycle
4       1.0 mg/m² on days 1 and 4 of each cycle
5       1.3 mg/m² on days 1 and 4 of each cycle

To tailor the stochastic approximation for the discrete
objective $\nu$, an obvious approach is to round
the output of (9) to its closest dose at each iteration.
For example, suppose that the dose labels
are $\{1, \ldots, K\}$, that is, $d_k = k$. Then a discretized
stochastic approximation may be expressed as

(10) $x_{i+1} = C\left\{x_i - \frac{1}{ib}(\bar{Y}_i - p)\right\}$

where $C(x)$ is the rounded value of $x$ if $0.5 \leq x < K + 0.5$,
and is set equal to 1 and $K$ respectively if
$x < 0.5$ or $x \geq K + 0.5$. Unfortunately, the discretized
stochastic approximation is rigid (Property 2). To
illustrate, consider applying (10) with $b = 0.2$ and
a target $p = 0.20$ in a trial with $x_1 = 1$ and $m = 2$.
Then no toxicity event in the first group, that is,
$\bar{Y}_1 = 0$, gives $x_2 = 2$. Further suppose that the second
group has a 50% toxicity rate ($\bar{Y}_2 = 0.5$). This
will bring the trial back to $x_3 = C(2 - 0.75) = 1$; it is
easy to see that the remaining subjects will receive
dose 1. To see how rigidity occurs for a general variable
type, we observe that since $x_i$ is an integer, the
update $x_{i+1}$ according to (10) will stay the same
as $x_i$ if $|(ib)^{-1}(\bar{Y}_i - p)| < 0.5$, whose probability approaches
1 at a rate of $O(i^{-2})$ according to Chebyshev's
inequality if $\bar{Y}_i$ has a finite variance. If $\bar{Y}_i$ is
bounded (e.g., binary outcomes), the rounded value of
$(ib)^{-1}(\bar{Y}_i - p)$ will always be zero as $i$ becomes sufficiently large,
and will not contribute to future updates. This problem,
called discrete barrier, is thus built by rounding
and the fact that the design points take on a
discrete set of levels. In the context of the CRM,
Shen and O'Quigley (1996) pointed out similar difficulties
in establishing the theoretical properties of
dose-finding methods due to the discrete barrier.
This is where the modern dose-finding literature departs
from the elegant stochastic approximation approach.
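
The rigidity example for (10) can be replayed numerically. In the Python sketch below the function names are made up for illustration, and rounding at exact half-integers (upward, with values at or beyond the ends mapped to levels 1 and $K$) is an assumption about how ties in the rounding map are resolved.

```python
import math


def clip_round(x, K):
    """Round x to the nearest dose label in {1, ..., K}; halves round up."""
    if x < 0.5:
        return 1
    if x >= K + 0.5:
        return K
    return math.floor(x + 0.5)


def discretized_sa(ybars, x1, p, b, K):
    """Design points from (10), given group toxicity proportions ybars."""
    xs = [x1]
    for i, ybar in enumerate(ybars, start=1):
        xs.append(clip_round(xs[-1] - (ybar - p) / (i * b), K))
    return xs
```

Here `discretized_sa([0, 0.5], 1, 0.20, 0.2, 5)` returns `[1, 2, 1]`, matching the example. From the third group onward every possible proportion in a group of two (0, 0.5 or 1) leaves the update at level 1, since an upward step needs an increment of at least 0.5 while the largest one available is $0.20/(0.2\,i) = 1/i \leq 1/3$.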

3.2 Stochastic Approximation and Model-Based 
Methods 

The Robbins-Monro stochastic approximation is 
a nonparametric procedure in that the convergence 
results depend only very weakly on the true underlying
$M(x)$. For the case of normal $Y$, interestingly,
Lai and Robbins (1979) showed that the recursion
output in (9) is identical to the solution $x_{i+1}$ of

(11) $\sum_{j=1}^{i} [Y_j - \{\alpha + b(x_j - x_{i+1})\}] = 0$,

which amounts to maximum likelihood estimation of
$\theta$ under a simple linear regression model. This connection
between the stochastic approximation and
a model-based approach motivates the study of the 
maximum likelihood recursion in the work of Wu 
(1985), Wu (1986), and Ying and Wu (1997) for data 
arising from the exponential family. In particular, 
for binary $Y$, Wu (1985) proposed the logit-MLE
that uses the logistic working model

(12) $F(x, \theta) = \frac{p \exp\{b(x - \theta)\}}{1 - p + p \exp\{b(x - \theta)\}}$ for some $b > 0$

and replaces the estimating equation (11) with
$\sum_{j=1}^{i} \{Y_j - F(x_j, x_{i+1})\} = 0$. Here, we focus on the
nonadaptive version, that is, where $b$ is a fixed constant.
A maximum likelihood version of the CRM
(4) would clearly yield the same design point as $x_{i+1}$
if the design space was continuous. In this regard,
the likelihood CRM is an analogue of the logit-MLE
for the discrete objective $\nu$.
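
A minimal sketch of the nonadaptive logit-MLE step follows, solving the estimating equation under working model (12) by bisection. The function names, the search bracket, and the choice to return `None` when all outcomes are identical (in which case the equation has no finite root) are assumptions made for illustration.

```python
import math


def F(x, theta, p, b):
    """Logistic working model (12); F(theta, theta) equals p."""
    e = math.exp(b * (x - theta))
    return p * e / (1 - p + p * e)


def logit_mle_next(xs, ys, p, b, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve sum_j {Y_j - F(x_j, t)} = 0 for t by bisection.

    The left-hand side is increasing in t, negative as t -> -infinity and
    positive as t -> +infinity whenever the outcomes are mixed. The default
    bracket assumes doses and b of moderate size.
    """
    if not 0 < sum(ys) < len(ys):
        return None                       # no finite root (assumed convention)

    def g(t):
        return sum(y - F(x, t, p, b) for x, y in zip(xs, ys))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, if five subjects are treated at the same dose $x = 2$ and exactly one has a toxicity, the observed rate equals the target $p = 0.20$ and the solution is $x_{i+1} = 2$ for any $b > 0$.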

In order to establish the asymptotic distribution 
of the logit-MLE (and the maximum likelihood recursion
in general), Ying and Wu (1997) showed
that the sequence $x_{i+1}$ is asymptotically equivalent
to an adaptive Robbins-Monro recursion; see the
proof of Theorem 3 in the article by Ying and Wu
(1997). While the justification of the model-based
logit-MLE relies on its asymptotic equivalence to
the nonparametric Robbins-Monro procedure, Wu
(1985) showed by simulation that the former is superior
to the latter in finite-sample settings with binary
data. Similarly, O'Quigley and Chevret (1991)
demonstrated that the CRM performs better than
the discretized stochastic approximation (10) for the
objective $\nu$.

These observations regarding the stochastic ap- 
proximation, the logit-MLE, and the CRM bear two 
practical suggestions. First, in typical dose-finding 
trial settings with binary data and small sample 
sizes, a model-based approach seems to retain some 
information that is otherwise lost when using non- 
parametric procedures. This speculation is made 
without assuming much confidence about the work- 
ing model. Second, one may study the theoretical 
(asymptotic) properties of the modern model-based 
method (e.g., CRM) by tapping the rich stochastic 
approximation literature, thus giving guidance on 
the choice of design parameters such as b in (12). 
This can be achieved, of course, only if we can re- 
solve the discrete barrier — we will return to this in 
Section 4.2. 

3.3 Stochastic Approximation and Adaptive 
Control 

Maximum likelihood recursion attempts to optimize
the prospect for the next subject by setting the
next design point at the current estimate of $\theta$, and is
myopic in that it does not consider the dose assignments
of future subjects. The Robbins-Monro procedure
is therefore myopic by (asymptotic) equivalence.
Lai and Robbins (1979) studied the adaptive
cost control aspect of the stochastic approximation
for normal $Y$ where $C_n = \sum_{i=1}^{n} (x_i - \theta)^2$ is defined
as the cost of a design sequence $\{x_i\}$ at stage $n$.
Specifically, they showed that the cost of (9) is of
the order $\sigma^2 \log n$ if $b < 2\beta$. Under some simple linear
regression models, Han, Lai, and Spivakovsky
(2006) showed that the myopic Bayesian rule is op- 
timal when the slope parameter is known. This sug- 
gests that the myopic Robbins-Monro method may 
also have good adaptive control properties. 
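
The behavior of the design cost $C_n$ can be explored numerically. The following is a minimal sketch, not the analysis of Lai and Robbins (1979): it assumes a linear model $Y = \alpha + \beta x + \sigma\epsilon$ with standard normal $\epsilon$, and it assumes the Robbins-Monro recursion takes the standard form $x_{i+1} = x_i - (ib)^{-1}(Y_i - y^*)$. The function name, the target response `y_target`, and all parameter names are hypothetical.

```python
import random


def robbins_monro_cost(alpha, beta, sigma, y_target, b, x1, n, rng):
    """Run a Robbins-Monro recursion and return (design points, cost C_n).

    Assumed model (illustrative only): Y = alpha + beta * x + sigma * eps,
    with eps standard normal. The root theta solves
    alpha + beta * theta = y_target.
    """
    theta = (y_target - alpha) / beta
    x = float(x1)
    design_points = []
    cost = 0.0
    for i in range(1, n + 1):
        design_points.append(x)
        cost += (x - theta) ** 2              # C_n = sum_i (x_i - theta)^2
        y = alpha + beta * x + sigma * rng.gauss(0.0, 1.0)
        x = x - (y - y_target) / (i * b)      # next design point
    return design_points, cost
```

Calling the function with a seeded `random.Random` instance for increasing `n` gives a rough empirical view of how slowly the cost accumulates when `b` is chosen below twice the slope.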

The control aspect of the stochastic approximation is less
clear for binary data. Bartroff and Lai (2010) addressed the
control problem by using techniques in approximate dynamic
programming to minimize some well-defined global risk, such as
the expectation of the design cost $C_n$. The authors
demonstrated reduction of the global risk by nonmyopic
approaches when compared to the myopic ones including the
stochastic approximation and the logit-MLE. The scope of the
simulations, however, is confined to situations where the
logistic model correctly specifies $\pi$. In addition, their
approach is intended for the continuous objective $\theta$,
instead of $\nu$.

Further research on the use of nonmyopic approaches in
dose-finding is warranted, especially for practical situations
with a discrete set of test doses. The design cost at stage $n$
for the discrete objective $\nu$ can be analogously defined as
$C'_n = \sum_{i=1}^{n} (x_i - \nu)^2$. Then a dose-finding
design $\mathcal{D}$ is consistent if and only if
$\lim_{n\to\infty} C'_n$ is finite almost everywhere. As
mentioned earlier, the myopic CRM is not necessarily consistent
(as it tries to treat each subject at the current "best" dose).
By contrast, designs that spread out the design points (e.g.,
the biased coin design) allow consistent estimation of $\nu$ at
the expense of the enrolled subjects. Neither guarantees a
finite $\lim_{n\to\infty} C'_n$. An optimal $\mathcal{D}$ for
the infinite-horizon control of $C'_n$ thus seems to resolve
the inherent tension between the welfare of enrolled subjects
(i.e., the cost is kept low) and the estimation of $\nu$ (i.e.,
$x_n$ is consistent).

4. ONGOING RELEVANCE 
4.1 Binary Versus Dichotomized Data 

As mentioned earlier, with a binary outcome and 
small samples, the Robbins-Monro procedure is gen- 
erally less efficient than model-based methods, and 
hence may not be suitable for clinical dose-finding 
where the study endpoint is classified as toxic and 
nontoxic. In many situations, however, the binary 
toxic outcome $T$ is defined by dichotomizing an observable
biomarker expression $Y$, namely, $T = I(Y > t_0)$
for some fixed safety threshold $t_0$, where $I(E)$ denotes
the indicator of the event $E$. The biomarker
Y apparently contains more information than the 
dichotomized T, and may be used to achieve the 
dose-finding objective (1) with greater efficiency. 

To illustrate, consider the regression model 

(13) $Y = M(x) + \sigma(x)\epsilon,$

where $\epsilon$ has a known distribution $G$ with mean 0
and variance 1. Under (13), the toxicity probability
can be expressed as $\pi(x) = 1 - G[\{t_0 - M(x)\}/\sigma(x)]$
and the continuous dose-finding objective (1) can be
shown to be equivalent to the solution to

(14) $f(x) = M(x) + z_p \sigma(x) = t_0,$

where $z_p$ is the upper $p$th percentile of $G$. To focus
on the comparison between the use of $Y$ and $T$,
suppose for the moment that a continuum of dose
$x$ is available. Further suppose that a trial enrolls
patients in small groups of size $m$. Let $x_i$ denote the
dose given to the $i$th group, and $Y_{ij}$ the biomarker
expression of the $j$th subject in the group. With this
experimental setup, we note that



(15) $O_i = \bar{Y}_i + [E\{S_i/\sigma(x_i)\}]^{-1} z_p S_i$

is an unbiased realization of $f(x_i)$, where $S_i$ is the
sample standard deviation of the observations in
group $i$. The expectation in (15) can be computed
for any given $G$, because $S_i/\sigma(x_i)$ depends on the
error variable $\epsilon$ but not $M$ and $\sigma$ under model (13).
In other words, $O_i$ is observable and is a continuous
variable that can be used to generate a stochastic
approximation recursion

(16) $x_{i+1} = x_i - (ib)^{-1}(O_i - t_0).$
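
The construction of $O_i$ in (15) and the recursion (16) can be sketched in code. The example below assumes that $G$ is standard normal, in which case $E(S_i/\sigma)$ has the closed form $\sqrt{2/(m-1)}\,\Gamma(m/2)/\Gamma\{(m-1)/2\}$. All function names, and the user-supplied mean and standard deviation functions `mean_fn` and `sd_fn`, are hypothetical and serve only as an illustration.

```python
import math
import random


def sd_ratio_mean_normal(m):
    """E(S / sigma) for m i.i.d. normal observations (requires m >= 2)."""
    return math.sqrt(2.0 / (m - 1)) * math.gamma(m / 2) / math.gamma((m - 1) / 2)


def group_observation(y, z_p):
    """O_i of (15) for one group of biomarker values y, assuming normal errors."""
    m = len(y)
    ybar = sum(y) / m
    s = math.sqrt(sum((v - ybar) ** 2 for v in y) / (m - 1))  # sample SD
    return ybar + z_p * s / sd_ratio_mean_normal(m)


def run_recursion_16(mean_fn, sd_fn, t0, z_p, b, x1, n_groups, m, rng):
    """Simulate recursion (16) on a continuous dose scale; return the dose path."""
    x = float(x1)
    path = [x]
    for i in range(1, n_groups + 1):
        # Biomarker values for the m subjects treated at dose x (model (13)).
        y = [mean_fn(x) + sd_fn(x) * rng.gauss(0.0, 1.0) for _ in range(m)]
        o = group_observation(y, z_p)
        x = x - (o - t0) / (i * b)
        path.append(x)
    return path
```

With, say, `mean_fn = lambda x: x` and a constant `sd_fn`, the function $f$ in (14) is linear with slope 1, so choosing `b = 1.0` corresponds to the step size that matches the slope of $f$.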



The design $\{x_n\}$ generated by (16) is consistent for
$\theta$ under the condition that $\theta$ is the unique solution
to (14). This condition holds, for example, when $M$
is strictly increasing and $\sigma$ is nondecreasing in $x$.
This is a reasonable assumption for many biological
measurements, for which the variability typically increases
with the mean. Furthermore, if $b < 2\beta$, where
$\beta = f'(\theta)$ here, then the asymptotic variance of $x_n$
is $v_O = \lim_{i\to\infty} \mathrm{var}(O_i)\{b(2\beta - b)\}^{-1}$.
In particular, when $\epsilon$ is standard normal,

$v_O = \dfrac{\sigma^2(\theta)\{1 + m z_p^2(\lambda_m - 1)\}}{m b(2\beta - b)},$

where

$\lambda_m = \dfrac{(m-1)\Gamma^2\{(m-1)/2\}}{2\Gamma^2(m/2)}.$



Now, instead of using the recursion (16), suppose
that we apply the logit-MLE based on the
dichotomized outcomes by solving
$\sum_{j=1}^{i}\{\bar{T}_j - F(x_j, \tilde{x}_{i+1})\} = 0$,
where $F$ is defined in (12). Then using the results in the
article of Ying and Wu (1997), we can show that
$\sqrt{n}(\tilde{x}_n - \theta)$ converges in distribution
to a mean zero normal with variance
$v_T = p(1-p)\{m\tilde{b}(2\tilde{\beta} - \tilde{b})\}^{-1}$,
where $\tilde{\beta} = \pi'(\theta) = \beta G'(z_p)/\sigma(\theta)$.

The asymptotic variances $v_O$ and $v_T$ are minimized
when $b = \beta$ and $\tilde{b} = \tilde{\beta}$, respectively.
Thus, the optimal choice depends on unknown parameters. For
the purpose of comparing efficiencies, suppose we
could set $b$ and $\tilde{b}$ to their respective optimal values.
Then the variance ratio is equal to

(17) $\dfrac{v_T}{v_O} = \dfrac{p(1-p)}{\{G'(z_p)\}^2\{1 + m z_p^2(\lambda_m - 1)\}}$

for normal noise, and also represents the asymptotic
efficiency of $x_n$ relative to $\tilde{x}_n$. For $m = 3$, the
ratio (17) attains a minimum of 1.238 when $p = 0.12$ or
0.88. As shown in Figure 2, the efficiency gain can
be substantial for any group sizes larger than 2,
especially when the target $p$ is extreme.

Fig. 2. Asymptotic efficiency of $x_n$ based on recursion (16)
relative to the logit-MLE $\tilde{x}_n$.
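
The variance ratio (17) is straightforward to evaluate for normal noise. The sketch below uses only the Python standard library; the function names `lambda_m` and `variance_ratio` are hypothetical, and $G$ is assumed to be the standard normal distribution so that $G'$ is its density.

```python
import math
from statistics import NormalDist


def lambda_m(m):
    """lambda_m = (m-1) Gamma^2{(m-1)/2} / {2 Gamma^2(m/2)}, for group size m >= 2."""
    return (m - 1) * math.gamma((m - 1) / 2) ** 2 / (2 * math.gamma(m / 2) ** 2)


def variance_ratio(p, m):
    """Ratio v_T / v_O in (17), assuming standard normal noise."""
    std_normal = NormalDist()
    z_p = std_normal.inv_cdf(1 - p)   # upper p-th percentile of G
    density = std_normal.pdf(z_p)     # G'(z_p)
    return p * (1 - p) / (density ** 2 * (1 + m * z_p ** 2 * (lambda_m(m) - 1)))
```

For instance, `variance_ratio(0.12, 3)` evaluates to approximately 1.238, in agreement with the minimum quoted for $m = 3$, and `variance_ratio(0.5, m)` equals $\pi/2$ for every group size because $z_p = 0$ at the median.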

4.2 Virtual Observations 

A particular obstacle to the use of stochastic
approximation is the discrete design space used in clinical
studies, which creates the discrete barrier (Section 3.1).
To overcome the discrete barrier, Cheung
and Elkind (2010) introduced the notion of virtual
observations. Precisely, the virtual observation of
the $i$th group of subjects is defined as

(18) $V_i = O_i + b(x_i^* - x_i),$

where $x_i^*$ denotes the assigned dose of the group,
which can take values on a continuous conceptual
scale that represents an ordering of doses. In the
situations where the actual given dose $x_i$ can take
on any real value, we have $x_i^* = x_i$ and $V_i = O_i$, and
thus, the recursion (16) may be used to approach the
target dose $\theta$. When $x_i$ is confined to $\{1, \ldots, K\}$,
Cheung and Elkind (2010) proposed generating a
stochastic approximation recursion based on the
virtual observations:

(19) $x_{i+1}^* = x_i^* - \dfrac{1}{ib}(V_i - t_0),$

and treating the next group of subjects at
$x_{i+1} = C(x_{i+1}^*)$. To initiate the virtual observation
recursion, one may set $x_1^* = x_1 \in \{1, \ldots, K\}$.
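
The virtual observation recursion (18)-(19) can be sketched as follows. This is an illustration under stated assumptions, not the implementation of Cheung and Elkind (2010): the errors are assumed normal, $C(x)$ is assumed to round to the nearest dose level in $\{1, \ldots, K\}$ (ties rounded up), and the names `round_to_dose`, `virtual_observation_design`, `mean_by_dose` and `sd_by_dose` are hypothetical.

```python
import math
import random


def round_to_dose(x_star, K):
    """Assumed C(x): nearest dose level in {1, ..., K}, ties rounded up."""
    return min(K, max(1, math.floor(x_star + 0.5)))


def virtual_observation_design(mean_by_dose, sd_by_dose, t0, z_p, b,
                               x1, n_groups, m, rng):
    """Simulate (18)-(19); return (doses given, final continuous x*)."""
    K = len(mean_by_dose)
    # E(S / sigma) for m normal observations (requires m >= 2).
    c_m = math.sqrt(2.0 / (m - 1)) * math.gamma(m / 2) / math.gamma((m - 1) / 2)
    x_star = float(x1)
    doses = []
    for i in range(1, n_groups + 1):
        x = round_to_dose(x_star, K)          # dose actually given
        doses.append(x)
        y = [mean_by_dose[x - 1] + sd_by_dose[x - 1] * rng.gauss(0.0, 1.0)
             for _ in range(m)]
        ybar = sum(y) / m
        s = math.sqrt(sum((v - ybar) ** 2 for v in y) / (m - 1))
        o = ybar + z_p * s / c_m              # O_i as in (15)
        v = o + b * (x_star - x)              # virtual observation (18)
        x_star = x_star - (v - t0) / (i * b)  # recursion (19)
    return doses, x_star
```

The continuous value `x_star` carries the recursion, while only the rounded dose is ever administered; the correction term `b * (x_star - x)` is what distinguishes this from simply rounding the output of (16).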

Cheung and Elkind (2010) proved, under mild conditions,
that $x_{i+1}^*$ generated by (19) is consistent
(hence nonrigid) for $\theta_b$, for some
$\theta_b \in \nu \pm 0.5$, and hence $x_{i+1}$ for $\nu$.
Briefly, for any given $b$, consistency will occur if the
neighboring doses of the MTD are sufficiently far from the MTD
in terms of toxicity probability. This is in essence the
property of being asymptotically weakly unbiased, as defined in
Section 2.3.3, and can be easily derived from Propositions 2
and 3 of Cheung and Elkind (2010).

With the use of continuous $V$'s, the notion of
coherence needs to be re-examined. In particular, the
virtual observation recursion (19) will de-escalate if
the biomarker expression of the current subjects has
a high average ($\bar{Y}_i$) or a large variability ($S_i$). This
is a sensible dose-escalation principle for situations
where the variability increases proportionally to the
mean.

The idea of virtual observation is to create an
objective function

$h(x) = E(V_i \mid x_i^* = x) = f\{C(x)\} + b\{x - C(x)\}$

that is defined on the real line, and has a local slope $b$
at $\{1, \ldots, K\}$, such that the solution $\theta$ of (14)
can be approximated by the solution $\theta_b$ of $h(x) = t_0$.
Quite importantly, since now the objective function $h$ has
a known slope $b$ around $\theta_b$ (under some Lipschitz-type
regularity conditions), we can use the same $b$
in the recursion (19) as in the definition of virtual
observations (18). This design feature enables us to
achieve optimal asymptotic variance without resorting
to adaptive estimation of the slope of the objective
function. It is particularly relevant to early
phase dose-finding studies where adaptive stochastic
approximation can be unstable due to small sample
sizes.
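
To make the objective function $h$ concrete, the sketch below evaluates $h(x) = f\{C(x)\} + b\{x - C(x)\}$ and searches for solutions $\theta_b$ of $h(x) = t_0$. It assumes that $C(x)$ rounds to the nearest dose in $\{1, \ldots, K\}$ with ties rounded up, so that $h$ is linear with slope $b$ on each interval $[k - 0.5, k + 0.5)$. The names `h`, `solve_theta_b` and `f_by_dose` are hypothetical.

```python
import math


def round_to_dose(x, K):
    """Assumed C(x): nearest dose level in {1, ..., K}, ties rounded up."""
    return min(K, max(1, math.floor(x + 0.5)))


def h(x, f_by_dose, b):
    """h(x) = f{C(x)} + b{x - C(x)}, with f given at the K dose levels."""
    k = round_to_dose(x, len(f_by_dose))
    return f_by_dose[k - 1] + b * (x - k)


def solve_theta_b(f_by_dose, b, t0):
    """Return every x with h(x) = t0 (at most one candidate per dose)."""
    K = len(f_by_dose)
    roots = []
    for k in range(1, K + 1):
        # On the piece where C(x) = k, h is the line f(k) + b (x - k).
        x = k + (t0 - f_by_dose[k - 1]) / b
        lower = -math.inf if k == 1 else k - 0.5
        upper = math.inf if k == K else k + 0.5
        if lower <= x < upper:
            roots.append(x)
    return roots
```

With hypothetical values `f_by_dose = [0.5, 2.2, 3.1, 4.5]`, `b = 1.0` and `t0 = 3.0`, the only solution lies on the piece belonging to dose 3. With `f_by_dose = [1, 2, 3.6, 4]` and `t0 = 2.6` there is no solution for this `b`, which illustrates why the consistency result requires the neighboring doses to be sufficiently separated from the target.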

5. LOOKING TO THE FUTURE 

Statistical methodology for dose-finding trials is
by its nature an application-oriented discipline.
Consequently, much of the emphasis in the dose-finding
literature has been on empirical properties via
simulation. However, as the (model-based) methods
become increasingly complicated, it is imperative to
check their properties against some theoretical
criteria so as to avoid pathological behaviors that may
not be detected in aggregate via simulations; rather,
pathologies such as incoherence and rigidity are
pointwise properties that can be found by careful
analytical study. As a case in point, the virtual
observation recursion (19) is presented in light of the
properties described in Section 2.3. Granted, as the
data content becomes richer, these theoretical
criteria have to be re-examined. Cheung (2010), in
another instance, extended the notion of coherence for
bivariate dose-finding in the context of phase I/II
trials (see the article by Thall (2010) for a review
of the bivariate dose-finding objective) and showed
how coherence can be used to simplify dose decisions
in the complex "black-box" approach of the
bivariate model-based methods, and to provide clinically
sensible rules.

The idea of virtual observation bridges the stochas- 
tic approximation and the modern (model-based) 
dose-finding literatures. As the Robbins-Monro 
method has motivated a large number of extensions 
and refinements for a wide variety of root-finding ob- 
jectives, there exists a reservoir of ideas from which 
we can borrow and apply to dose-finding methods 
for specialized clinical situations. To name a few, 
consult the works of Kiefer and Wolfowitz (1952) 
for finding the maximum of a regression function, 
and Blum (1954) for multivariate contour-finding. 
While studying the analytical properties of model- 
based designs in these specialized situations can be 
difficult, connection to the theory-rich stochastic ap- 
proximation procedures allows us to do so with rel- 
ative ease and elegance, as is the case for the virtual 
observation recursion (19). In this light, extending 
the idea of virtual observations for data types other 
than continuous and multivariate data appears to 
be a promising "crosswalk" that warrants further 
research. 